#acl +All:read Default
= Information Extraction from Polish free text =

== Project factsheet ==

|| Polish name:          || Opracowanie narzędzi do ekstrakcji informacji z tekstów w języku polskim ||
|| Project type:         || A national [[http://www.eng.nauka.gov.pl/meinen/|Ministry of Science and Higher Education]] research grant (number 3T11C00727) ||
|| Duration:             || 20 October 2004 ‒ 19 October 2007 ||
|| Principal investigator: || Agnieszka Mykowiecka ||
|| Institution:          || Linguistic Engineering Group, Institute of Computer Science, Polish Academy of Sciences ||


== Project description ==

Motivations:
 * few information extraction (IE) efforts have targeted Polish texts, in contrast to the many existing applications for other languages,
 * existing IE tools could not be directly used for processing Polish.

Goals:
 * adapting selected IE tools to process Polish,
 * collecting linguistic resources for IE.

Activities:
 * adapting the IE platforms SProUT and (more recently) GATE to the tokenization and morphological analysis of Polish texts,
 * collecting resources and IE grammars for named entity recognition (NER) in Polish texts,
 * rule-based IE experiments in a selected domain (medical texts),
 * testing methods of terminology extraction on Polish data.